PREFACE


This document describes the results of an effort at the Electro-Optical
Terminal Guidance Branch, Guided Weapons Division, U.S. Air Force Armament
Laboratory, Eglin AFB, Florida.

The work reported herein was performed under Contract 749620-82-0C-0035
during the period 15 May 1984 to 31 July 1984 by the author, Dr. Robert R.
Kallman, while visiting the Armament Laboratory as a Southeastern Center for
Electrical Engineering Education (SCEEE) Postdoctoral Fellow.

The  author  would  like  to  thank  the  Air  Force  Systems  Command,  the  Air 
Force  Office  of  Scientific  Research,  and  SCEEE  for  providing  him  with  the 
opportunity  to  spend  a  very  worthwhile  and  interesting  10  weeks  at  the 
Armament  Laboratory.  The  author  would  like  to  acknowledge  in  particular  the 
Electro-Optical  Terminal  Guidance  Branch  and  the  Image  Processing  Laboratory 
for  their  hospitality  and  excellent  working  conditions.  Special  thanks  go  to 
Captain  James  Riggins  and  Steve  Butler  for  introducing  this  present  research 
topic  to  the  author  and  patiently  explaining  its  many  aspects  to  him. 
SECTION I


INTRODUCTION 

The  work  reported  in  this  document  is  the  result  of  a  mathematical 
approach  to  an  engineering  problem.  The  work  is  described  in  mathematical 
language.  The  purpose  of  this  introduction  is  to  describe  the  engineering 
problem  being  attacked,  thus  explaining  the  motivation  for  the  mathematical 
work,  and  to  define  enough  terms  so  that  the  reader  has  the  proper  mindset  to 
appreciate  what  is  reported  herein. 

A large body of work has been dedicated to shape recognition or object
identification. One particular approach to this problem is optical
correlation. This is usually attempted with a discriminant function,
implemented in a holographic optical filter, which contains sufficient
information about the object so that an image of the object can be processed
by the filter. If the object sought is in the filter input plane, a large
correlation peak should occur in the filter output plane. An excellent
summary of how this can be accomplished with holography is found in the
survey article by Casasent and Caulfield, Reference 1.

Practical problems which appear include how to design the discriminant
function to contain sufficient information, how to implement the discriminant
function to make a tangible filter, and how to enhance filter efficiency. The
discriminant function design is in many respects a mathematical problem and is
addressed in this report.

In the approach relevant to this report, the discriminant function is
constructed by judiciously adding information from selected aspects of the
object of interest, thus creating the synthetic discriminant function (SDF),
Reference 2. This report addresses how best to choose the information used to
create the SDF.

The input data is imagery. The format for the imagery is usually a 512
pixel by 512 pixel scene. In order to use this information, it is convenient
to break out the 512 by 512 = 262,144 pixels into a string of integers on a
magnetic tape. Mathematically, therefore, one can think of a scene as a
vector with 262,144 components. Ideas such as vector inner products then
follow naturally.
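As a small illustration of this point of view, the following Python sketch
reads a tiny scene out row by row into a vector and takes an inner product. It
is not taken from the programs described in this report; the 3 by 3 scenes and
the helper names flatten and inner are made up for the example.

```python
# Illustrative sketch: tiny 3 x 3 "scenes" stand in for 512 x 512 images.

def flatten(image):
    """Read a 2-D image out row by row into one long vector."""
    return [pixel for row in image for pixel in row]

def inner(u, v):
    """Euclidean inner product <u, v> of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical pixel data.
scene_a = [[0, 10, 0],
           [10, 50, 10],
           [0, 10, 0]]
scene_b = [[0, 0, 0],
           [10, 50, 10],
           [0, 0, 0]]

vec_a = flatten(scene_a)
vec_b = flatten(scene_b)

print(len(vec_a))            # 9 components (262,144 for a 512 x 512 scene)
print(inner(vec_a, vec_b))   # 100 + 2500 + 100 = 2700
```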

As  filters  are  designed,  criteria  must  be  established  by  which  the 
quality  of  the  filters  can  be  measured.  One  of  the  unique  aspects  of  the  work 
reported  herein  is  the  fact  that  the  filter  performance  is  quantified. 

Testing each filter can be done optically, but for this effort it was
much more efficient to simulate all the optics on a digital computer. This
was done in the E-O Terminal Guidance Branch Image Processing Laboratory,
using its VAX 11/750.

A  glossary  is  included  in  the  report. 


SECTION  II 


OPTICAL  FILTERING  AND  SDF  BASICS 

Imagine a two-dimensional infrared image $f(x_1, x_2)$ of a scene which
contains an object of interest. Consider the following operations on f. Map
$f(x_1, x_2)$ to its Fourier transform $F(f)(k_1, k_2)$, multiply by the
Fourier transform $F(h^{\#})(k_1, k_2)$ of a suitable function, take the
inverse Fourier transform of the product to obtain the convolution
$(f * h^{\#})(x_1, x_2)$, and measure the magnitude
$|f * h^{\#}|^2(x_1, x_2)$. Here $h^{\#}(y_1, y_2) = h(-y_1, -y_2)$, so
$(f * h^{\#})(x_1, x_2)$ may be viewed as the inner product of f and the
translate of h by $(-x_1, -x_2)$. All of these operations can be carried out
almost instantaneously, for Fourier transforms and their inverses can be
carried out by lenses, and the important multiplication and filtering step can
be carried out by passing the light wave $F(f)$ through a suitable hologram
transparency, Reference 1, incorporating information about $F(h^{\#})$. If h,
the synthetic discriminant function or SDF, is suitably constructed and
scaled, the objects of interest should be centered at the points
$(x_1, x_2)$ such that

$$|f * h^{\#}|^2(x_1, x_2) = 1. \tag{1}$$

In reality f will probably be a 512 by 512 pixel image, and h will be 32
by 32 pixels in size. One can think of the filter as operating by
instantaneously placing translates of h all over f, taking the corresponding
inner products, and indicating those places where the magnitude of the inner
product is large. These should be the places where objects of interest are
located. This, in a very rough form, is the optical matched filtering process
via an SDF.
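The picture of placing translates of h all over f and taking inner products
can be sketched digitally as follows. This is only a toy illustration under
stated assumptions: a 5 by 5 scene and a 2 by 2 template stand in for the 512
by 512 image and 32 by 32 SDF, the function name correlate is hypothetical,
and the direct double loop is used in place of Fourier transforms.

```python
def correlate(f, h):
    """Inner product of h with every same-sized window of f (no wrap-around)."""
    fr, fc = len(f), len(f[0])
    hr, hc = len(h), len(h[0])
    out = []
    for r in range(fr - hr + 1):
        row = []
        for c in range(fc - hc + 1):
            row.append(sum(f[r + i][c + j] * h[i][j]
                           for i in range(hr) for j in range(hc)))
        out.append(row)
    return out

# A made-up 2 x 2 object, scaled so that its inner product with h is 1.
target = [[1, 2],
          [3, 4]]
energy = sum(v * v for row in target for v in row)   # 1 + 4 + 9 + 16 = 30
h = [[v / energy for v in row] for row in target]

# A 5 x 5 scene that is zero except for one copy of the object at row 2, col 1.
f = [[0] * 5 for _ in range(5)]
for i in range(2):
    for j in range(2):
        f[2 + i][1 + j] = target[i][j]

# Squared magnitude of the response at every placement of h.
response = [[abs(v) ** 2 for v in row] for row in correlate(f, h)]
peak = max((v, r, c) for r, row in enumerate(response) for c, v in enumerate(row))
print(peak)   # (largest response, row, column)
```

The largest response is found where the object was placed, which is the
digital analogue of looking for a correlation peak in the filter output plane.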


The correct construction of h is obviously of the utmost importance if
this scheme is to be successful. An early attempt along these lines was made
by simply taking a number of transparencies of the object of interest, at a
variety of aspects and angles, and overlaying them, Reference 3. Hence, in
these early attempts one started with m images $f_1, \ldots, f_m$ and let
$h = f_1 + \cdots + f_m$. This is a good idea if $f_1, \ldots, f_m$ are
mutually orthogonal, but in general they are not.

In the past few years a generalization of this original approach was
suggested by Caulfield and Maloney, Reference 4, and by Hester, Casasent, et
al., References 5 - 10, who proposed to take a number of pictures
$f_1, \ldots, f_m$ of the object of interest and to choose h to be a suitable
linear combination of $f_1, \ldots, f_m$. So in theory

$$h = a_1 f_1 + \cdots + a_m f_m \tag{2}$$

for some suitable choice of constants $a_1, \ldots, a_m$. Thinking of the
$f_i$ ($1 \le i \le m$) as vectors in some high dimensional Euclidean space
(e.g., for images that are 512 pixels by 512 pixels, the dimension of the
space is 262,144), h is expressed as a linear combination of the vectors
$f_1, \ldots, f_m$. To determine the $a_i$ ($1 \le i \le m$), make the ad hoc
assumption that $\langle h, f_j \rangle = 1$ ($1 \le j \le m$) and use the
bilinearity of the inner product to obtain

$$1 = \langle h, f_j \rangle = a_1 \langle f_1, f_j \rangle + \cdots + a_m \langle f_m, f_j \rangle, \tag{3}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product between
vectors. If the m x m matrix $(\langle f_i, f_j \rangle)$ is nonsingular, as
it most probably is for quite different images $f_1, \ldots, f_m$, then there
is a unique choice for the $a_i$ ($1 \le i \le m$), and they can be determined
easily by solving a system of m equations in m unknowns.
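The system of m equations in m unknowns described above can be sketched in a
few lines of Python. This is a minimal illustration, not the author's program:
the three 4-pixel images are made up, and the helper names inner and solve are
hypothetical.

```python
def inner(u, v):
    """Euclidean inner product <u, v>."""
    return sum(x * y for x, y in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(x) for x in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

# Three made-up, non-orthogonal "images" with 4 pixels each (hypothetical data).
images = [[1, 0, 2, 0],
          [0, 1, 1, 0],
          [1, 1, 0, 3]]
m = len(images)

gram = [[inner(fi, fj) for fj in images] for fi in images]   # matrix (<f_i, f_j>)
a = solve(gram, [1.0] * m)                                   # one equation per image
h = [sum(a[i] * images[i][k] for i in range(m)) for k in range(len(images[0]))]

print([round(inner(h, f), 9) for f in images])   # each response should be 1
```

Only the m x m matrix of inner products enters the solve, so the cost of this
step does not depend on the number of pixels once that matrix is known.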




The notion has persisted that one should not use all of the original
images $f_1, \ldots, f_m$ to manufacture the SDF h, but instead should use a
subset of p (p less than m) of them, usually selected by some sort of
orthogonalization procedure. Reasons given for not using all of the images
include problems with correlating on clutter. Given that this rather dubious
notion has some merit, the question remains of how to formulate carefully the
choice of the p best from all m images.


SECTION  III 


OBJECTIVES 

The objective of the author during the SCEEE fellowship was to design
from scratch a variety of programs to generate the best possible SDF's from a
training set of images and to compare the SDF's to each other. The training
set (a set of 36 images to be used to construct the SDF's) consisted of 512 by
512 pixel 8-bit infrared tank images, but pixel values outside of rows 200 to
400 inclusive were all zero. These images were dirty, in the sense that they
did not consist of tanks in a zero background, as would be desired, but were
images with a very bright background included. The images had been previously
edge-enhanced and biased. Two typical members of the tank imagery with the
background removed are shown in Figure 1. An aspect of one type of SDF (CSDF1,
cf. the next section) made from this training set is shown in Figure 2.
The training set images were furnished on a computer magnetic tape, each image
consisting of a string of $512^2$ = 262,144 integers (each integer with a
value between 0 and 255), representing intensity levels at each pixel of the
image. The guiding principle throughout the effort was that notions such as
good or best be determined by concrete numerical criteria. In general, the
programs try to drive a least squares error down to 0. The computing was done
on a VAX 11/750 running VMS 3.5 in the E-O Terminal Guidance Branch Image
Processing Laboratory.


Figure  1.  Two  Tanks  in  the  Training  Set 
SECTION IV


THE  PROGRAMS  AND  NUMERICAL  RESULTS 

Let $f_1, \ldots, f_{36}$ be the tank images, thought of as vectors in a
high dimensional ($512^2$ = 262,144) Euclidean space. The following concepts
and programs, with minor modifications, apply to any number of images, not
just 36, and of any size, not just 512 by 512. In all of the calculations it
fortuitously turns out that the only thing one really needs to know about the
$f_i$'s is the symmetric 36 by 36 matrix $(\langle f_i, f_j \rangle)$, which
should be calculated and stored first. As a measure of error made by a
potential SDF h, the least squares error was chosen:

$$\mathrm{LSE}(h) = |\langle h, f_1 \rangle - 1|^2 + |\langle h, f_2 \rangle - 1|^2 + \cdots + |\langle h, f_{36} \rangle - 1|^2. \tag{4}$$

This  procedure  is  extremely  plausible,  for  such  a  measure  of  error  has  proven 
useful  over  the  past  200  years  in  astronomy  and  statistics. 
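To illustrate why only the matrix of inner products is needed, the sketch
below evaluates the least squares error of Equation (4) for an SDF written as
a linear combination of the training images, using nothing but that matrix.
The 3 by 3 matrix is a made-up example, and the names responses and lse are
hypothetical.

```python
def responses(a, gram):
    """<h, f_j> for h = a_1 f_1 + ... + a_m f_m, from the Gram matrix alone."""
    m = len(gram)
    return [sum(a[i] * gram[i][j] for i in range(m)) for j in range(m)]

def lse(a, gram):
    """Equation (4): sum over j of |<h, f_j> - 1|^2."""
    return sum((r - 1.0) ** 2 for r in responses(a, gram))

# Hypothetical matrix of inner products (<f_i, f_j>) for three images.
gram = [[5, 2, 1],
        [2, 2, 1],
        [1, 1, 11]]

print(lse([0, 0, 0], gram))   # h = 0 misses each of the 3 targets by 1: 3.0
print(lse([1, 1, 1], gram))   # plain sum f_1 + f_2 + f_3: 49 + 16 + 144 = 209.0
```

The second value shows how poorly the plain sum of non-orthogonal images can
perform under this measure.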

The  following  is  a  list  of  some  of  the  SDF's  calculated,  giving  their 
method  of  calculation  and  the  least  squares  error  for  each.  They  are  named 
and  numbered  in  the  order  in  which  they  were  calculated. 

SDF1: This is the theoretically perfect SDF and is a linear combination
of all 36 tank images. So SDF1 = $a_1 f_1 + a_2 f_2 + \cdots + a_{36} f_{36}$
for some choice of constants $a_1, \ldots, a_{36}$. The $a_j$'s must satisfy
the 36 equations

$$a_1 \langle f_1, f_i \rangle + a_2 \langle f_2, f_i \rangle + \cdots + a_{36} \langle f_{36}, f_i \rangle = 1 \tag{5}$$

for i between 1 and 36. They are easily calculated by Gaussian elimination.
The resulting LSE(SDF1) = 0.0.

SDF2: This SDF is a linear combination of 6 tank images. It was
calculated by exhaustively checking all (36 choose 6) = 1,947,792 subsets of
the 36 tank images, and for each fixed subset of 6, calculating that linear
combination which makes the LSE as small as possible. It is easy to check
that LSE(h) is a convex function of h, so a local minimum for LSE(h) is a
global minimum for LSE(h). If the $f_i$ are independent vectors and h is
restricted to be a linear combination of them, it is easy to check that LSE(h)
becomes uniformly unbounded as $\|h\|$ increases, so a global minimum for
LSE(h) exists. Suppose $g_1, \ldots, g_6$ is a subcollection of 6 out of the
36 tank images.

We would like to find numbers $a_1, \ldots, a_6$ so that
$h = a_1 g_1 + \cdots + a_6 g_6$ minimizes LSE(h) over all possible choices of
the $a_i$. The above reasoning indicates that a minimum exists and is assumed
when the partial derivative of LSE(h) with respect to each $a_i$ is 0. Doing
this for each $a_i$ gives us 6 linear equations which must be satisfied. They
are

$$m_{i1} a_1 + \cdots + m_{i6} a_6 = b_i, \tag{6}$$

where

$$m_{ij} = \langle g_i, f_1 \rangle \langle f_1, g_j \rangle + \langle g_i, f_2 \rangle \langle f_2, g_j \rangle + \cdots + \langle g_i, f_{36} \rangle \langle f_{36}, g_j \rangle \tag{7}$$

and

$$b_i = \langle g_i, f_1 \rangle + \langle g_i, f_2 \rangle + \cdots + \langle g_i, f_{36} \rangle. \tag{8}$$

Given the $a_i$'s and $b_i$'s as above, a little algebra shows that

$$\mathrm{LSE}(h) = 36 - a_1 b_1 - \cdots - a_6 b_6. \tag{9}$$

So the program to compute SDF2 proceeds as follows: select 6 out of the 36
tank images $g_1, \ldots, g_6$, find $a_1, \ldots, a_6$ by solving one set of
6 equations in 6 unknowns, and compute $36 - a_1 b_1 - \cdots - a_6 b_6$.
Select that subset of 6 which makes this last number as small as possible, and
manufacture SDF2 from them by computing $a_1 g_1 + \cdots + a_6 g_6$. The
images selected by SDF2 were 4, 12, 16, 24, 29, and 32. The resulting
LSE(SDF2) = 0.061511.
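The exhaustive search behind SDF2 can be sketched as follows, using only the
matrix of inner products and the normal equations (6) through (9). This is a
small-scale illustration under stated assumptions, not the original program:
five made-up 6-pixel images and subsets of size 2 replace the 36 tank images
and subsets of size 6, indices are 0-based, and the names solve, best_subset
and direct_lse are hypothetical.

```python
from itertools import combinations
from math import comb

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(x) for x in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def best_subset(gram, p):
    """Try every p-subset; return (LSE, subset, coefficients) of the best one."""
    m = len(gram)
    best = None
    for subset in combinations(range(m), p):
        # Equation (7): sum over j of <g_i, f_j><f_j, g_k>
        M = [[sum(gram[i][j] * gram[j][k] for j in range(m)) for k in subset]
             for i in subset]
        # Equation (8): sum over j of <g_i, f_j>
        b = [sum(gram[i][j] for j in range(m)) for i in subset]
        a = solve(M, b)                                    # Equation (6)
        error = m - sum(x * y for x, y in zip(a, b))       # Equation (9)
        if best is None or error < best[0]:
            best = (error, subset, a)
    return best

def direct_lse(a, subset, gram):
    """Equation (4) evaluated term by term, as a cross-check on Equation (9)."""
    m = len(gram)
    return sum((sum(x * gram[i][j] for x, i in zip(a, subset)) - 1.0) ** 2
               for j in range(m))

# Five made-up, linearly independent "images" with 6 pixels each.
images = [[1, 0, 0, 0, 1, 0],
          [0, 2, 0, 0, 1, 0],
          [0, 0, 1, 0, 1, 1],
          [0, 0, 0, 3, 0, 1],
          [1, 1, 1, 1, 0, 0]]
gram = [[sum(x * y for x, y in zip(fi, fj)) for fj in images] for fi in images]

error, subset, a = best_subset(gram, 2)
print(comb(len(images), 2), "subsets searched; best subset (0-based):", subset)
print("LSE from Equation (9):", error)
print("LSE computed directly:", direct_lse(a, subset, gram))
```

The two printed error values should agree up to rounding, which is a useful
check on the algebra leading to Equation (9).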


SDF3: This SDF is a linear combination of 6 tank images. It was
calculated in a step-by-step orthogonalization procedure from SDF1. Let $g_1$
be that tank image so that the orthogonal projection of SDF1 onto the line
spanned by $g_1$ is largest. Take the orthogonal projection of SDF1 and of
$f_1, \ldots, f_{36}$ onto the orthocomplement of $g_1$ to obtain SDF1$'$ and
$f_1', \ldots, f_{36}'$ (only 35 of which are now nonzero), and repeat the
process 5 more times. Keep track of the index chosen each time to get the 6
desired original images $g_1, \ldots, g_6$. The calculations are easy, for
$g_1$ is that image $f_i$ so that the angle between SDF1 and $f_i$ is as small
as possible. That is, $g_1$ is that $f_i$ so that

$$\frac{|\langle \mathrm{SDF1}, f_i \rangle|}{\|\mathrm{SDF1}\| \, \|f_i\|} \tag{10}$$

is as large as possible. Once $g_1$ is chosen, the iteration is simple, for

$$\langle f_i', f_j' \rangle = \langle f_i, f_j \rangle - \frac{\langle f_i, g_1 \rangle \langle f_j, g_1 \rangle}{\|g_1\|^2} \tag{11}$$

and

$$\langle \mathrm{SDF1}', f_i' \rangle = \langle \mathrm{SDF1}, f_i \rangle - \frac{\langle \mathrm{SDF1}, g_1 \rangle \langle f_i, g_1 \rangle}{\|g_1\|^2}. \tag{12}$$

So repetition is easy. The images selected, in the order in which they were
chosen, are 17, 12, 4, 29, 16, and 24. SDF3 is then that linear combination
of these images which minimizes the LSE. The resulting LSE(SDF3) = 0.085728.
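The step-by-step selection can be sketched with updates to the matrix of inner
products alone, following Equations (10) through (12). The code below is an
illustration under stated assumptions rather than the original program: the
3 by 3 matrix comes from three made-up images, indices are 0-based, the
constant factor $\|\mathrm{SDF1}\|$ in Equation (10) is dropped because it
does not affect which image is largest, and the name greedy_select is
hypothetical.

```python
from math import sqrt

def greedy_select(gram, s, p):
    """Pick p image indices by repeated projection.

    gram[i][j] holds <f_i, f_j>; s[i] holds <SDF1, f_i>.
    Assumes p does not exceed the number of independent images.
    """
    m = len(gram)
    G = [[float(x) for x in row] for row in gram]
    s = [float(x) for x in s]
    chosen = []
    for _ in range(p):
        # Skip images already chosen (they have been projected to zero).
        candidates = [i for i in range(m) if i not in chosen and G[i][i] > 1e-12]
        # Equation (10) without the constant ||SDF1||.
        g = max(candidates, key=lambda i: abs(s[i]) / sqrt(G[i][i]))
        chosen.append(g)
        col = [G[i][g] for i in range(m)]      # <f_i, g> before the update
        norm2, sg = G[g][g], s[g]
        # Equation (12): project SDF1 off the chosen image.
        s = [s[i] - sg * col[i] / norm2 for i in range(m)]
        # Equation (11): project every image off the chosen image.
        G = [[G[i][j] - col[i] * col[j] / norm2 for j in range(m)]
             for i in range(m)]
    return chosen

# Inner products of the made-up images f_1 = (1,0,0), f_2 = (1,1,0), f_3 = (0,0,2).
gram = [[1, 1, 0],
        [1, 2, 0],
        [0, 0, 4]]
s = [1, 1, 1]        # a theoretically perfect SDF has <SDF1, f_i> = 1 for all i

print(greedy_select(gram, s, 2))   # indices in the order chosen
```

In this toy case the first image is chosen first, and after projection the
second image carries no remaining response, so the third image is chosen next.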

SDF3A: This SDF is a linear combination of 6 tank images. The choice of
the images is done in the same manner as was done for SDF3, so the images are
the same. However, SDF3A was chosen to be the theoretically perfect SDF on
these 6 images. It is simple to check that SDF3A is the orthogonal projection
of SDF1 onto the subspace spanned by the 6 selected images. LSE(SDF3A) =
0.271316.

SDF4: This SDF is a linear combination of 6 tank images. To find the
images, an exhaustive search of all (36 choose 6) = 1,947,792 subsets of 6
tank images is made. The subset of 6 chosen is that subset such that the
orthogonal projection of the theoretically perfect SDF1 onto their span is as
large as possible. As in SDF3A, this projection is simple to calculate, for
it coincides with the theoretically perfect SDF made from the 6 working
images. SDF4 is then that linear combination of the images chosen which
minimizes the LSE. The images chosen were 4, 12, 16, 17, 24, and 29, the same
images chosen by SDF3. This is a fluke (cf. CSDF3 and CSDF4, to be discussed
later). LSE(SDF4) = 0.085728.

SDF4A: SDF4A stands in the same relation to SDF4 as SDF3A does to SDF3.
LSE(SDF4A) = 0.271316.

SDF5: This SDF was calculated in the same manner as was SDF2, except that
SDF5 is a linear combination of 5 images. The images chosen were 4, 12, 17,
18, and 29. Notice that the best 5 images are not a subset of the best 6
images. LSE(SDF5) = 0.080739.

SDF6: This SDF was calculated in the same manner as were SDF2 and SDF5,
except that SDF6 is a linear combination of 4 images. The images chosen were
4, 12, 17, and 29. LSE(SDF6) = 0.103963.

SDF7: This SDF is a linear combination of 6 tank images. They are chosen
in a step-by-step orthogonalization procedure. Roughly speaking, the first
image chosen is that one which contains as much information as possible about
all of the other images. The numerical measure for this information is taken
to be the sum of the squares of the cosines of the angles between all of the
images. So the first image chosen, $g_1$, is that image $f_i$ so that

$$\sum_{j=1}^{36} \frac{\langle f_i, f_j \rangle^2}{\|f_i\|^2 \, \|f_j\|^2}$$

is as large as possible. All vectors are now projected onto the
orthocomplement of $g_1$, as in the calculation of SDF3, and the process is
repeated 5 more times. The images selected, in the order in which they were
chosen, are 16, 28, 6, 22, 11, and 36. LSE(SDF7) = 0.258666.
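The first selection step for SDF7 can be illustrated as follows. This sketch
assumes that the measure for image $f_i$ is the sum over all j of the squared
cosine of the angle between $f_i$ and $f_j$, computed from the matrix of inner
products; the term j = i contributes 1 to every score and so does not affect
the choice. The three 2-pixel images and the names information_scores and
first_pick are made up for the example, and indices are 0-based.

```python
def information_scores(gram):
    """Sum over j of cos^2(angle between f_i and f_j), for each i."""
    m = len(gram)
    return [sum(gram[i][j] ** 2 / (gram[i][i] * gram[j][j]) for j in range(m))
            for i in range(m)]

def first_pick(gram):
    """Index of the image with the largest score."""
    scores = information_scores(gram)
    return max(range(len(scores)), key=lambda i: scores[i])

# Inner products of the made-up images f_1 = (1,0), f_2 = (1,1), f_3 = (0,1).
gram = [[1, 1, 0],
        [1, 2, 1],
        [0, 1, 1]]

print(information_scores(gram))   # [1.5, 2.0, 1.5]
print(first_pick(gram))           # 1: the diagonal image overlaps both others
```

Later steps would project all vectors onto the orthocomplement of the chosen
image and apply the same scoring to the projected matrix.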

SDF7A: SDF7A stands in the same relation to SDF7 as SDF3A does to SDF3.
LSE(SDF7A) = 0.753411.

SDF8: This SDF stands in the same relation to SDF7 as SDF4 does to SDF3.
This was not run, for the required computation time was estimated to exceed
50 hours of CPU time.

SDF8A: SDF8A stands in the same relation to SDF8 as SDF3A does to SDF3.
It was not run for the same reason that SDF8 was not run.

The  results  of  the  above  SDF  fabrications  are  tabulated  in  Table  1. 

Suppose one chose to make an SDF from the images 1, 2, 5, 16, 19, and 36.
The very best SDF that can be made with these images has LSE = 0.183932. If
one took these same 6 images and took the orthogonal projection of SDF1 onto
their span, and used this orthogonal projection as an SDF, the resulting
LSE = 0.247391.

Some concern was expressed that the SDF's might be correlating on the
clutter in the background and not on the tanks themselves. For this reason,
and because working with 512 by 512 images consumed inordinate amounts of CPU
time, the tank images were now extracted from the background and placed into
256 by 256 arrays, using DeAnza image processing equipment. All previous
calculations were repeated on this new data. The results are summarized in
Table 2. In general, each CSDF was manufactured in exactly the same manner as
the correspondingly numbered SDF, except that the clean images were used
instead of the dirty images. Figure 2 is a picture of the larger pixels in a
biased version of CSDF1, biased to make all of its entries nonnegative.




TABLE 1. SUMMARY OF THE SDF CALCULATIONS

SDF     Technique                                   Images chosen           LSE
------  ------------------------------------------  ----------------------  --------
SDF1    Linear combination of all 36 images.        All 36                  0.0
SDF2    Calculate 36 choose 6.                      4, 12, 16, 24, 29, 32   0.061511
SDF3    Calculate 36 choose 6, but choose each of   17, 12, 4, 29, 16, 24   0.085728
        the 6 according to maximum value of
        |<SDF1,f_i>| / (||SDF1|| ||f_i||).
SDF3A   Same as SDF3, but SDF3A is chosen to be     17, 12, 4, 29, 16, 24   0.271316
        the theoretically perfect SDF on the 6
        chosen images.
SDF4    Calculate 36 choose 6, choose to maximize   4, 12, 16, 17, 24, 29   0.085728
        the orthogonal projection of SDF1 onto
        their span.
SDF4A                                               4, 12, 16, 17, 24, 29   0.271316
SDF5    Calculate 36 choose 5.                      4, 12, 17, 18, 29       0.080739
SDF6    Calculate 36 choose 4.                      4, 12, 17, 29           0.103963
SDF7                                                16, 28, 6, 22, 11, 36   0.258666
SDF7A                                               16, 28, 6, 22, 11, 36   0.753411
SDF8                                                Not run
SDF8A                                               Not run

TABLE 2. SUMMARY OF THE CSDF CALCULATIONS

SDF      Images chosen                                    LSE
-------  -----------------------------------------------  --------
CSDF1    Linear combination of all 36 clean tank images.
CSDF2    14, 16, 24, 29, 32, and 33                       0.870515
CSDF3    17, 16, 33, 14, 32, and 12                       1.289280
CSDF3A   17, 16, 33, 14, 32, and 12                       2.731533
CSDF4    12, 14, 16, 17, 31, and 33                       1.414930
CSDF4A   12, 14, 16, 17, 31, [illegible]                  2.160517
CSDF5    14, 17, 24, 29, and [illegible]                  1.046511
CSDF6    12, 17, 29, and 31                               1.235655
CSDF7    31, 28, 17, 24, 1, [illegible]                   1.420816
CSDF7A   31, 28, 17, 24, 1, [illegible]                   6.821821
CSDF8    Not computed
CSDF8A   Not computed

Note on CSDF4: the images used to manufacture CSDF4 do not coincide with the
images used to manufacture CSDF3.

Note on CSDF7A: the lesson to be learned from this computation is that given a
collection of images, one should do the best job one can in manufacturing an
SDF from them.

SECTION  V 


CONCLUSIONS  AND  RECOMMENDATIONS 

The numbers in Tables 1 and 2 speak for themselves. Assume that
minimizing least squares errors is a reasonable approach to quantifying the
goodness of an SDF, and suppose that it is desired to make an SDF out of some
small subset of all the images, no matter how dubious this concept may seem.
The computations summarized in Tables 1 and 2 strongly suggest that a least
squares choice and manufacture of an SDF on 5 images always does better than
any orthogonalization procedure on 6 images (and usually much better), and
that a least squares choice and manufacture of an SDF on 4 images usually does
better than most orthogonalization procedures on 6 images (and sometimes much
better). They also strongly suggest that the worst orthogonalization
procedure is the one which tries to find 6 images which contain the most
information about the other images and then takes one's SDF to be the
theoretically perfect SDF manufactured from these 6 images. It is doubtful
that the time-consuming computation of SDF8, SDF8A, CSDF8, or CSDF8A would
change these empirical conclusions.

While the ideas described to pick the best p out of m images to
manufacture an SDF work fairly well if p and m are not too large, they will
not work in a practical sense if m = 100 and p = 10, say, for then to
calculate the analogue of SDF2 would involve finding solutions to (100 choose
10), around $1.731 \times 10^{13}$, sets of 10 equations in 10 unknowns, a
very large task indeed. There might be fairly short computational procedures
for finding results close to the theoretically best. Perhaps the random
selection and testing of 10 scenes at a time, combined with some sort of
gradient technique, would work fairly well.
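The growth in the number of subsets is easy to check. The short Python sketch
below counts the linear systems an exhaustive search would have to solve for
the cases mentioned in this report; the dictionary name counts is made up for
the example.

```python
from math import comb

# Number of p-element subsets of m images, i.e. the number of
# p-by-p linear systems an exhaustive search must solve.
counts = {(m, p): comb(m, p) for m, p in [(36, 4), (36, 5), (36, 6), (100, 10)]}

for (m, p), n in counts.items():
    print(f"{m} choose {p} = {n:,}")

# Ratio of the m = 100, p = 10 search to the m = 36, p = 6 search.
print(counts[(100, 10)] // counts[(36, 6)])
```

The last line shows that the larger search involves several million times as
many systems as the search carried out for SDF2, before accounting for the
larger size of each system.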

One can generalize the LSE estimator by replacing each summand by
$w_i \, |\langle h, f_i \rangle - v_i|^2$. Think of $w_i$ as a weight.
Usually $v_i$ will take on the value 0 or 1, but it can be any value one
desires. This should have some uses and should be tested in an appropriate
setting.
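A weighted error of this kind might be computed as in the sketch below. It is
only an illustration: the response values, weights, and targets are made up,
and the name weighted_lse is hypothetical. Setting every weight and every
target to 1 recovers the unweighted error of Equation (4).

```python
def weighted_lse(responses, weights, targets):
    """Sum over i of w_i * |<h, f_i> - v_i|^2, given the responses <h, f_i>."""
    return sum(w * (r - v) ** 2
               for r, w, v in zip(responses, weights, targets))

# Hypothetical responses <h, f_1>, <h, f_2> of some SDF h.
responses = [1.0, 0.2]

# First image is a true target (v = 1); second is a decoy to be rejected
# (v = 0) and is penalized twice as heavily (w = 2).
print(weighted_lse(responses, weights=[1, 2], targets=[1, 0]))   # about 0.08

# With unit weights and unit targets this is the ordinary LSE.
print(weighted_lse(responses, weights=[1, 1], targets=[1, 1]))   # about 0.64
```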

The numerical experiments described in Section IV should be tried on
training sets of images which have been edge-enhanced and then
energy-normalized. Recall that edge-enhancement normally involves starting
with an image f, mapping f to its Fourier transform $F(f)$, deleting a
suitable disk containing the origin from $F(f)$, and mapping the result back
to image space by an inverse Fourier transform. Most practitioners in image
processing insist on this. The simple example usually given is that if this
is not done, then any (uniformly filled in) circle would correlate quite well
with a (uniformly filled in) square of roughly the same size, even though they
are quite different objects. This process also has a considerable technical
advantage, for if an image f has already been edge-enhanced as above, then
$F(f)$ vanishes at (0,0), i.e., the integral of f over $R^2$ vanishes. Notice
that if an SDF h is made up of a linear combination of such images, it too has
the same property. Furthermore, any biasing done to h then does not change
its correlations with any zero mean image. To see this, recall that a biasing
of h involves replacing h by $h + h'$, where $h'$ is a vector all of whose
components have the same constant value, say c. But if g is any zero mean
image, then $\langle h + h', g \rangle = \langle h, g \rangle + \langle h', g \rangle = \langle h, g \rangle$,
for $\langle h', g \rangle = 0$ since it is equal to c times the sum of the
pixel values in g. Notice that grave errors may be made by blithely biasing
SDF's without taking into consideration the mean of g, for then
$\langle h', g \rangle$ may be quite large.

There is a subtle but potentially very serious problem in this circle of
ideas. Since the SDF will be made from edge-enhanced images, 32 by 32 pixels
in size, it will be looking for edge-enhanced images, 32 by 32 pixels in size.
But an edge-enhanced 512 by 512 image does not have in general each 32 by 32
subscene edge-enhanced, in the sense that the sum of the pixel values in this
subscene equals 0. For suppose f is an image such that $F(f) = 0$ in a disk
about the origin in Fourier transform space. Let B be a square box in image
space and let $I_B$ be its characteristic function; i.e., $I_B$ is 1 at points
in the box and is 0 at points outside the box. If $f I_B$ were edge-enhanced,
then $F(f I_B)(0,0) = 0$. But $F(f I_B)(0,0)$ is the integral of f over B,
which certainly may be nonzero even if the integral of f over all of $R^2$ is
0. For a concrete but somewhat artificial example of this, suppose $F(f)$
equals $F(I_B)$ outside of the deleted disk. Then
$F(f I_B)(0,0) = (F(f) * F(I_B))(0,0) = \langle F(f), F(f) \rangle$, a
positive number. This computation does show that the more the original scene
is edge-enhanced, the smaller is $F(f I_B)(0,0)$, but how much
edge-enhancement is enough to avoid serious errors in the correlation process?
These errors might be especially pronounced if one is using a biased SDF.
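Both observations about biasing and subscenes can be checked on a toy example.
The 4-pixel vectors below are made up, and the names h_biased, g_zero_mean,
and g_bright are hypothetical; the sketch only illustrates the arithmetic of
the argument above.

```python
def inner(u, v):
    """Euclidean inner product <u, v>."""
    return sum(x * y for x, y in zip(u, v))

h = [1, 2, 3, 4]                 # a made-up SDF
c = 10                           # bias added to every component of h
h_biased = [x + c for x in h]

g_zero_mean = [3, -1, -2, 0]     # pixel values sum to 0
g_bright = [3, 1, 2, 0]          # pixel values sum to 6

# Biasing leaves the correlation with a zero mean image unchanged ...
print(inner(h, g_zero_mean), inner(h_biased, g_zero_mean))   # -5 -5

# ... but shifts the correlation with any other image by c times its pixel sum.
print(inner(h, g_bright), inner(h_biased, g_bright))         # 11 71

# A scene whose pixels sum to 0 need not have subscenes that sum to 0.
subscene = g_zero_mean[:2]
print(sum(g_zero_mean), sum(subscene))                       # 0 2
```

The last line is the discrete counterpart of the remark that the integral of f
over a box B may be nonzero even though the integral over the whole plane
vanishes.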

The dirty tank images had been edge-enhanced as above and then biased, so
that all their entries were nonnegative, and then discretized into 256 equal
parts, hence the bit streams which appeared on the data tape. One way to
come close to recapturing the original edge-enhanced images would be to take
the dirty images, compute the average pixel value over all pixels in rows
200 to 400 inclusive, and subtract this average pixel value from the same
pixels. But then the DeAnza image processing equipment could not have been
used to outline the images and toss out the clutter in the background. A
somewhat inexact but hopefully fairly reasonable way out of this conundrum is
to take each of the clean images, compute the average nonzero pixel value, and
subtract this average from each of the nonzero pixels, leaving those pixels
with 0 values unaltered.

Recall that energy-normalization is equivalent to replacing each $f_i$ by
$f_i / \|f_i\|$. Many practitioners in image processing insist on this. The
usual reason given is to reduce the effects of the climatic conditions under
which the images were recorded. There are at least two reasons why this
practice should be done with a certain amount of caution. Since the SDF will
be made from energy-normalized images, 32 by 32 pixels in size, it will be
looking for energy-normalized images, 32 by 32 pixels in size. But an
energy-normalized 512 by 512 image certainly does not, in general, have each
32 by 32 subscene proportionally energy-normalized. If the object sought is
the brightest object in the 512 by 512 scene, then energy-normalizing the
entire image will leave the object subscene more than proportionally
energy-normalized, and no harm will result if one is searching for a
correlation peak. But if there is a much brighter object, say a fire, in the
upper left hand corner of the image, and the object sought is in the lower
right hand corner, then energy-normalizing the entire scene perhaps will make
the image of the object sought so faint as to be useless. Furthermore, the
Schwarz inequality implies that the SDF will have the largest inner product
with unit vectors that do not look like the objects sought, but instead look
like the SDF itself. This difference might be quite pronounced. For the SDF
process to work, without further processing, one must make an act of faith
that there are no real objects which look more like the SDF than the sought
for images themselves.
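The consequence of the Schwarz inequality can be seen in a two-pixel toy
example. The vectors below are made up, and the helper names inner and
normalize are hypothetical; the sketch simply compares the response of an SDF
to its own training images with its response to a unit vector pointing along
the SDF itself.

```python
from math import sqrt

def inner(u, v):
    """Euclidean inner product <u, v>."""
    return sum(x * y for x, y in zip(u, v))

def normalize(v):
    """Energy-normalize v, i.e. replace v by v / ||v||."""
    length = sqrt(inner(v, v))
    return [x / length for x in v]

f1 = [1.0, 0.0]        # two made-up energy-normalized training images
f2 = [0.0, 1.0]
h = [1.0, 1.0]         # theoretically perfect SDF: <h, f1> = <h, f2> = 1
u = normalize(h)       # unit vector that looks like the SDF itself

print(inner(h, f1), inner(h, f2))   # 1.0 1.0
print(inner(h, u))                  # about 1.414, larger than either
```

Among all unit vectors, the one proportional to h gives the largest response,
here the length of h, which exceeds the response to either training image.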


The  training  set  images  should  first  be  edge-enhanced,  and  then 
energy-normalized.  Note  that  simple  examples  show  that  edge-enhancing  and 
energy-normalizing  are  not  commutative  operations.  For  example,  suppose  an 
image  in  its  right  half  is  uniformly  bright  with  fuzzy  edges  and  contains  a 
faint  object  with  sharp  edges  in  its  left  half.  First  energy-normalizing  and 
then  edge-enhancing  would  destroy  the  left  hand  object  and  leave  an  empty 
scene,  while  first  edge-enhancing  and  then  energy-normalizing  would  leave  a 
sharp  image  of  the  object  on  the  left  and  nothing  on  the  right. 

The  author  knows  of  no  scientifically  unimpeachable  reason  why  the 
theoretically  perfect  SDF  should  not  be  used,  instead  of  one  made  from  a  small 
number  of  pictures,  no  matter  how  they  are  chosen.  At  first  glance  it  seems 
implausible  that  one  can  do  a  better  job  by  throwing  away  information. 

Perhaps  repeating  the  experiments  of  Section  IV  on  edge-enhanced  and  then 
energy-normalized  training  sets  will  shed  light  on  this  important  issue. 

Even if SDF's made from a small number of images have lower correlation
with clutter, there still might be several ways to enhance the theoretically
perfect SDF. Notice, for instance, that the number of theoretically perfect
SDF's is enormous. If h is a theoretically perfect SDF (i.e., if
$\langle h, f_j \rangle = 1$, $1 \le j \le m$), then one can add to h any
vector which is in the orthocomplement of the $f_j$'s and still obtain another
theoretically perfect SDF. Furthermore, they all arise in this manner. So if
the $f_j$'s are d by d images, then the set of theoretically perfect SDF's is
a hyperplane in $R^{d^2}$ of dimension $d^2 - m$. This gives one hope that
superior SDF's exist.
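The freedom described above can be checked on a toy example. In the sketch
below, two made-up 3-pixel images leave a one-dimensional orthocomplement
(3 - 2 = 1), and adding any vector from it to a theoretically perfect SDF
leaves every response equal to 1. The names h_alt and n are hypothetical.

```python
def inner(u, v):
    """Euclidean inner product <u, v>."""
    return sum(x * y for x, y in zip(u, v))

f1 = [1, 0, 0]          # two made-up training images in a 3-pixel space
f2 = [0, 1, 0]
h = [1, 1, 0]           # a theoretically perfect SDF: <h, f1> = <h, f2> = 1

n = [0, 0, 5]           # orthogonal to both f1 and f2
h_alt = [x + y for x, y in zip(h, n)]

print(inner(n, f1), inner(n, f2))           # 0 0
print(inner(h_alt, f1), inner(h_alt, f2))   # 1 1
```

Both h and h_alt are theoretically perfect on the training set, but they will
in general respond differently to images outside it, which is where a choice
among them could matter.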
GLOSSARY OF TERMINOLOGY


Edge-Enhance  -  A  technique  used  to  reduce  the  low  spatial  frequencies  in  an 
image,  so  that  only  high  spatial  frequencies  (small  objects  and  edges  of  large 
objects)  remain  in  the  image.  This  is  discussed  at  some  length  in  Section  V. 

Energy-Normalize - A technique used to adjust the total energy in an image to
some fixed value, usually 1. This is discussed at some length in Section V.

Vector Space - In this paper only concrete vector spaces over the reals are
needed, i.e., only $R^n$, the set of ordered n-tuples of real numbers. A
typical vector is of the form $a = (a_1, \ldots, a_n)$ where each $a_i$ is a
real number. Here n may be much larger than 2 or 3.

Convex Function - A real valued function f on $R^n$ is said to be convex if
$f(ta + (1 - t)b) \le t f(a) + (1 - t) f(b)$ for all vectors
$a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ in $R^n$ and all real
numbers $0 \le t \le 1$. A convex function has the property that a local
minimum is a global minimum.

Inner Product - If $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ are
two vectors, then their inner product, $\langle a, b \rangle$, is defined to
be $\langle a, b \rangle = a_1 b_1 + \cdots + a_n b_n$.

Length of a Vector - If $a = (a_1, \ldots, a_n)$ is a vector, its length,
$\|a\|$, is defined to be $\|a\| = \langle a, a \rangle^{1/2}$.


Angle Between Two Vectors - If $a = (a_1, \ldots, a_n)$ and
$b = (b_1, \ldots, b_n)$ are two nonzero vectors, the angle between them is
defined to be the unique angle $\theta$ between 0 and $\pi$ which satisfies
$\langle a, b \rangle = \|a\| \, \|b\| \cos(\theta)$. The Schwarz inequality
guarantees that $\theta$ exists.

Orthogonal Vectors - In view of the previous definition, it is natural to say
that two vectors $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ are
orthogonal if the angle between them is $\pi/2$, i.e., if
$\langle a, b \rangle = 0$.


Orthocomplement - If S is any nonempty collection of vectors, then the
orthocomplement of S, denoted $S^{\perp}$, is the set of all vectors which are
orthogonal to every vector in S. $S^{\perp}$ is always a linear subspace of
$R^n$.

Orthogonal Projection - If S is any linear subspace of $R^n$, then the
orthogonal projection onto S is the operator $P_S$ which carries any vector a
to $P_S(a)$, that unique element in S which is closest to a. $P_S(a)$ always
exists and $P_S$ is a linear operator.


Synthetic Discriminant Function (SDF) - See Section II for a detailed
description of this object.